Member

sjakobi commented on Nov 2, 2025

...and define differenceWith via differenceWithKey.

Closes #389, closes #364.
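
For readers unfamiliar with the operation, here is a minimal reference sketch of the intended semantics. It is not the implementation from this PR: the names `differenceWithKeyRef` and `differenceWithRef` and the example maps are hypothetical, and the sketch simply walks the first map and looks each key up in the second. It assumes the `k -> v -> w -> Maybe v` shape for the combining function, which is what `differenceWith f = differenceWithKey (const f)` implies.

```haskell
import Data.Hashable (Hashable)
import qualified Data.HashMap.Strict as HM

-- | Hypothetical reference version (NOT the PR's implementation):
-- fold over the first map and look up each key in the second map.
differenceWithKeyRef
  :: (Eq k, Hashable k)
  => (k -> v -> w -> Maybe v)
  -> HM.HashMap k v
  -> HM.HashMap k w
  -> HM.HashMap k v
differenceWithKeyRef f a b = HM.foldlWithKey' step HM.empty a
  where
    step acc k v = case HM.lookup k b of
      -- Key occurs only in the first map: keep the entry unchanged.
      Nothing -> HM.insert k v acc
      -- Key occurs in both maps: the combining function decides.
      Just w -> case f k v w of
        Nothing -> acc                 -- drop the entry
        Just v' -> HM.insert k v' acc  -- keep it with the new value

-- | Same idea as in the PR: the key-less variant ignores the key.
differenceWithRef
  :: (Eq k, Hashable k)
  => (v -> w -> Maybe v)
  -> HM.HashMap k v
  -> HM.HashMap k w
  -> HM.HashMap k v
differenceWithRef f = differenceWithKeyRef (const f)

-- Hypothetical example inputs.
exampleA, exampleB :: HM.HashMap Int Int
exampleA = HM.fromList [(1, 10), (2, 20), (3, 30)]
exampleB = HM.fromList [(2, 5), (3, 30)]
```

With `f k v w = if even k then Nothing else Just (v + w)`, the expression `differenceWithKeyRef f exampleA exampleB` equals `HM.fromList [(1, 10), (3, 60)]`: key 1 is untouched, key 2 is dropped, and key 3 is combined. This sketch performs one lookup in the second map per entry of the first map and makes no attempt at the structural sharing a real implementation would want.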

...and define `differenceWith` via `differenceWithKey`

Closes #389.
This makes the overlapping case significantly faster.
sjakobi force-pushed the sjakobi/issue389-dWK branch from 5539624 to 261ba6d on November 5, 2025 21:17
sjakobi marked this pull request as ready for review on November 5, 2025 21:25

```haskell
differenceWith f = differenceWithKey (const f)
{-# INLINE differenceWith #-}

-- | \(O(n \log m)\) Difference with a combining function. When two equal keys are
```

Collaborator

What are m and n here?

Member Author

n is the size of the first map. m is the size of the second map. This is a convention this package uses for many functions. I suspect it was adopted from containers.

Collaborator

Okay, but I don't think that's necessarily the cost of this implementation; the bound was left unchanged from the old one.

Member Author

It's not obviously wrong, to me at least. If the first map is small, the second is a relatively large superset, and `lookup[Cont]` takes O(log m), we still do n lookups in the larger map. To be fair, we don't start these lookups at the root, so maybe O(n log (m/n)) would be more accurate?!

IMHO these log(size) terms are not very useful anyway: on 64-bit systems the maximum tree height is 13, and on 32-bit systems it is 8. And you can still have a map with two entries and full tree height…
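
To make the two candidate expressions concrete, the following sketch evaluates n log m and n log (m/n) for a few made-up size pairs. The function names, the sample sizes, and the choice of base 2 are all assumptions for illustration; nothing here measures the actual implementation or settles which bound is right.

```haskell
-- | n * log2 m: the expression used in the Haddock comment under review.
boundNLogM :: Double -> Double -> Double
boundNLogM n m = n * logBase 2 m

-- | n * log2 (m / n): the alternative floated in this thread.
boundNLogMOverN :: Double -> Double -> Double
boundNLogMOverN n m = n * logBase 2 (m / n)

-- | Print both quantities for a few hypothetical (n, m) pairs,
-- where n is the size of the first map and m the size of the second.
printBounds :: IO ()
printBounds = mapM_ row [(10, 1000), (1000, 1000000), (1000000, 1000000)]
  where
    row (n, m) =
      putStrLn $
        "n=" ++ show n
          ++ " m=" ++ show m
          ++ " n*log m=" ++ show (boundNLogM n m)
          ++ " n*log(m/n)=" ++ show (boundNLogMOverN n m)
```

Note that the second expression evaluates to 0 when n equals m, so taken literally it cannot account for the work of traversing the first map. Bounds of this shape are usually written with a `+ 1` inside the logarithm for that reason.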

Collaborator

The bounds are given assuming sufficiently uniform hashing, but that's not at all the case for important instances like Int. It's ... a problem. I can't say if n log (m/n) is accurate or not, but it should be something symmetrical!

Collaborator

I take that back. Maybe not symmetrical. But ... I dunno...

Collaborator

Maybe not symmetrical actually. I have no idea.

Member Author

I have opened #543 to track this.

sjakobi merged commit a7736a1 into master on Nov 10, 2025 (9 checks passed)
sjakobi deleted the sjakobi/issue389-dWK branch on November 10, 2025 12:11
Linked issues:

- Provide a differenceWithKey function
- difference and differenceWith could be much faster